Knowledge discovery by accuracy maximization.
Authors
Abstract
Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, KODAMA is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure that maximizes cross-validated predictive accuracy. Because validation is integrated into the procedure itself, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental gene expression and metabolomics datasets, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition that sharply separates all post-Reagan speeches from all pre-Reagan speeches. The transition occurs during Reagan's presidency, not at its beginning.
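The abstract describes the core idea only at a high level: a classifier's cross-validated predictive accuracy serves as the objective, and label assignments are searched with a Monte Carlo procedure. The sketch below illustrates that idea under simplifying assumptions and is not the authors' implementation. The leave-one-out 1-nearest-neighbour classifier, the random-relabeling proposal, the accept-if-not-worse rule, and the names `loo_knn_accuracy` and `maximize_cv_accuracy` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def loo_knn_accuracy(X, labels, k=1):
    """Leave-one-out cross-validated accuracy of a k-nearest-neighbour classifier.

    Hypothetical helper: the classifier choice is an assumption, not KODAMA's.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        # Squared Euclidean distance from sample i to every *other* sample,
        # so sample i never votes for its own label.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n


def maximize_cv_accuracy(X, n_classes, n_iter=200, n_flip=1, k=1, seed=0):
    """Monte Carlo hill-climb over label assignments (hypothetical sketch).

    Starts from random labels, proposes a random relabeling of `n_flip`
    samples, and keeps the proposal only if the cross-validated accuracy
    does not decrease. Returns the final labels, their accuracy, and the
    accuracy trace (the starting value plus one entry per iteration).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(n_classes) for _ in X]
    best = loo_knn_accuracy(X, labels, k)
    trace = [best]
    for _ in range(n_iter):
        candidate = list(labels)
        for i in rng.sample(range(len(X)), n_flip):
            candidate[i] = rng.randrange(n_classes)
        acc = loo_knn_accuracy(X, candidate, k)
        if acc >= best:  # accept moves that do not lower CV accuracy
            labels, best = candidate, acc
        trace.append(best)
    return labels, best, trace


if __name__ == "__main__":
    # Two well-separated 2-D blobs (made-up toy data for illustration).
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    labels, acc, trace = maximize_cv_accuracy(X, n_classes=2, n_iter=100)
    print(labels, acc)
```

This toy objective is maximized trivially by giving every sample the same label, so a long enough run can drift toward a single class; a practical method needs safeguards that this sketch omits.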
Similar resources
A Knowledge Management Approach to Discovering Influential Users in Social Media
A key step for a marketer's success is to discover influential users, who diffuse information to followers interested in it and thereby increase its spread on social media. Such users can reduce the cost of advertising, increase sales and maximize diffusion of information. A key problem is how to precisely identify the most influential users on social networks. In this pape...
Cluster Based Cross Layer Intelligent Service Discovery for Mobile Ad-Hoc Networks
The ability to discover services in Mobile Ad hoc Network (MANET) is a major prerequisite. Cluster based cross layer intelligent service discovery for MANET (CBISD) is cluster based architecture, caching of semantic details of services and intelligent forwarding using network layer mechanisms. The cluster based architecture using semantic knowledge provides scalability and accuracy. Also, the mini...
A data mining approach to employee turnover prediction (case study: Arak automotive parts manufacturing)
Training and adaption of employees are time and money consuming. Employees’ turnover can be predicted by their organizational and personal historical data in order to reduce probable loss of organizations. Prediction methods are highly related to human resource management to obtain patterns by historical data. This article implements knowledge discovery steps on real data of a manufacturing pla...
Knowledge Discovery from Area-Class Resource Maps: Capturing Prototype Effects
This paper presents a knowledge discovery approach to extracting knowledge from area-class resource maps. Prototype theory forms the basis of the approach which consists of two major components: (1) a scheme for organizing knowledge used in categorizing geographic entities which allows for the modeling of indeterminate boundaries and non-uniform memberships within categories; and (2) a data mi...
Subverting Knowledge Discovery in Adversarial Settings
Knowledge-discovery technologies are assessed by how well they model real-world situations, but little attention has been paid to how robust they are when the data they use has been deliberately manipulated. We show that it is straightforward to subvert some mainstream prediction technologies (decision trees, support vector machines), and clustering technologies (expectation-maximization and ma...
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume 111, Issue 14
Pages: -
Publication date: 2014